Variational Inference for Policy Search in Changing Situations
Author
Abstract
Many policy search algorithms minimize the Kullback-Leibler (KL) divergence to a certain target distribution in order to fit their policy. The commonly used KL-divergence forces the resulting policy to be 'reward-attracted': the policy tries to reproduce all positively rewarded experience, while negative experience is neglected. However, the KL-divergence is not symmetric, and we can also minimize the reversed KL-divergence, which is typically used in variational inference. The policy then becomes 'cost-averse': it tries to avoid reproducing any negatively rewarded experience while maximizing exploration. Due to this 'cost-averseness' of the policy, Variational Inference for Policy Search (VIP) has several interesting properties. It requires neither a kernel bandwidth nor an exploration rate; such settings are determined automatically by the inference. The algorithm matches the performance of state-of-the-art methods while also being applicable to learning in multiple situations simultaneously. We concentrate on using VIP for policy search in robotics and apply our algorithm to learn dynamic counterbalancing of different kinds of pushes with human-like 2-link and 4-link robots.
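To make the two projections concrete, here is a small self-contained sketch (our illustration, not the paper's algorithm): a single Gaussian is fitted to a bimodal 'reward' target, once by minimizing KL(p||q) (moment projection, reward-attracted) and once by minimizing the reversed KL(q||p) (information projection, cost-averse). The toy reward, the grid, and the numpy/scipy usage are all assumptions made for this example.

```python
# Toy contrast of the two KL directions; NOT the paper's VIP algorithm.
import numpy as np
from scipy.optimize import minimize

a = np.linspace(-6.0, 6.0, 2001)                     # action grid
da = a[1] - a[0]
r = -np.minimum((a - 2.0) ** 2, (a + 2.0) ** 2)      # reward with two peaks
p = np.exp(r - r.max())
p /= p.sum() * da                                    # normalized target p ~ exp(r)

def gauss(a, mu, sigma):
    return np.exp(-0.5 * ((a - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# KL(p || q): the optimum matches the moments of p, so q spreads its mass
# over BOTH reward peaks ("reproduce all positively rewarded experience").
mu_m = (p * a).sum() * da
sigma_m = np.sqrt((p * (a - mu_m) ** 2).sum() * da)

# KL(q || p): minimized numerically; q commits to ONE peak and assigns
# little mass to low-reward regions ("avoid negative experience").
def reverse_kl(theta):
    mu, log_sigma = theta
    q = gauss(a, mu, np.exp(log_sigma))
    return (q * (np.log(q + 1e-300) - np.log(p + 1e-300))).sum() * da

mu_i, log_sigma_i = minimize(reverse_kl, x0=[1.0, 0.0]).x
print(f"KL(p||q) fit: mu={mu_m:.2f}, sigma={sigma_m:.2f}")              # broad
print(f"KL(q||p) fit: mu={mu_i:.2f}, sigma={np.exp(log_sigma_i):.2f}")  # one peak
```

The moment projection returns a broad Gaussian centered between the two peaks, while the reversed direction locks onto a single peak, mirroring the reward-attracted versus cost-averse behavior described above.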
Similar Papers
Combine Monte Carlo with Exhaustive Search: Effective Variational Inference and Policy Gradient Reinforcement Learning
In this paper we discuss very preliminary work on how we can reduce the variance in black box variational inference based on a framework that combines Monte Carlo with exhaustive search. We also discuss how Monte Carlo and exhaustive search can be combined to deal with infinite dimensional discrete spaces. Our method builds upon and extends a recently proposed algorithm that constructs stochast...
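The snippet above is truncated, but the general recipe it gestures at can be sketched as follows (a hedged reading, not the paper's exact estimator): enumerate the highest-probability discrete states exhaustively and use Monte Carlo only for the remaining tail, which keeps the estimator unbiased while the enumerated head contributes no sampling variance. The distribution, the integrand, and the head_tail_estimate helper are all hypothetical.

```python
# Hedged sketch: exact summation over a "head" of probable states,
# Monte Carlo over the tail (a Rao-Blackwellized, unbiased estimator).
import numpy as np

rng = np.random.default_rng(0)
K = 1000                                   # discrete state space size
q = rng.dirichlet(np.full(K, 0.05))        # a peaked distribution q(z)
f = rng.normal(size=K)                     # arbitrary integrand f(z)

def head_tail_estimate(q, f, k=20, n_samples=100):
    head = np.argsort(q)[-k:]                      # exhaustive part
    exact = float(q[head] @ f[head])
    tail = np.setdiff1d(np.arange(len(q)), head)   # Monte Carlo part
    q_tail = q[tail] / q[tail].sum()
    z = rng.choice(tail, size=n_samples, p=q_tail)
    return exact + q[tail].sum() * f[z].mean()

mc_only = f[rng.choice(K, size=100, p=q)].mean()   # plain Monte Carlo
print("head+tail:", head_tail_estimate(q, f))
print("plain MC :", mc_only)
print("exact    :", float(q @ f))
```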
Natural Gradients via the Variational Predictive Distribution
Variational inference transforms posterior inference into parametric optimization, thereby enabling the use of latent-variable models where it would otherwise be impractical. However, variational inference can be finicky when different variational parameters control variables that are strongly correlated under the model. Traditional natural gradients that use the variational approximation fail t...
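For context, here is a minimal sketch of the 'traditional' natural-gradient step the snippet contrasts against: the ELBO gradient is preconditioned with the inverse Fisher information of the variational approximation q = N(mu, sigma^2). The conjugate toy model (z ~ N(0,1), x|z ~ N(z,1)) and the closed-form gradients are our own choices for illustration, not the paper's method.

```python
# Traditional natural-gradient ascent on the ELBO for a conjugate toy model.
import numpy as np

x = 1.5                               # single observation
mu, log_sigma = -3.0, 1.0             # variational parameters of q

for step in range(200):
    s2 = np.exp(2 * log_sigma)
    # Closed-form ELBO gradient for this toy model (exact posterior N(x/2, 1/2)).
    g_mu = x - 2 * mu
    g_ls = 1 - 2 * s2
    # Fisher of N(mu, sigma^2) in (mu, log sigma) is diag(1/sigma^2, 2),
    # so the natural gradient simply rescales each coordinate.
    mu += 0.1 * (s2 * g_mu)
    log_sigma += 0.1 * (g_ls / 2)

print(f"q: N({mu:.3f}, {np.exp(2 * log_sigma):.3f})  exact: N({x / 2:.3f}, 0.500)")
```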
Black-Box Policy Search with Probabilistic Programs
In this work we show how to represent policies as programs: that is, as stochastic simulators with tunable parameters. To learn the parameters of such policies we develop connections between black box variational inference and existing policy search approaches. We then explain how such learning can be implemented in a probabilistic programming system. Using our own novel implementation of such ...
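A hedged sketch of that connection: a policy written as a stochastic simulator with tunable parameters, trained with the score-function (likelihood-ratio) gradient estimator that black-box variational inference also relies on. The one-step Gaussian policy and the quadratic reward below are illustrative assumptions, not the paper's benchmark.

```python
# Policy-as-program trained with a REINFORCE-style score-function gradient.
import numpy as np

rng = np.random.default_rng(1)
theta = np.array([0.0, 0.0])                  # [mu, log_sigma]

def policy_program(theta):
    """A stochastic simulator: running it samples an action."""
    mu, log_sigma = theta
    return mu + np.exp(log_sigma) * rng.standard_normal()

def grad_log_prob(theta, action):
    """Score function of the Gaussian policy at the sampled action."""
    mu, log_sigma = theta
    s2 = np.exp(2 * log_sigma)
    return np.array([(action - mu) / s2, (action - mu) ** 2 / s2 - 1.0])

baseline = 0.0
for step in range(2000):
    a = policy_program(theta)
    r = -(a - 3.0) ** 2                       # toy reward, optimum at a = 3
    baseline = 0.99 * baseline + 0.01 * r     # running baseline reduces variance
    theta = theta + 0.01 * (r - baseline) * grad_log_prob(theta, a)

print("learned mu =", theta[0], " sigma =", np.exp(theta[1]))
```

As the policy converges on the optimal action, sigma shrinks, which is the expected behavior for a deterministic optimum.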
Variational Action Selection for Influence Diagrams
Influence diagrams provide a compact way to represent problems of decision making under uncertainty. As the number of variables in the problem increases, computing exact expectations and making optimal decisions becomes computationally intractable. A new method of action selection is presented, based on variational approximate inference. A policy is approximated where high-probability actions u...
Variational probabilistic inference and the QMR-DT database
We describe a variational approximation method for efficient inference in large-scale probabilistic models. Variational methods are deterministic procedures that provide approximations to marginal and conditional probabilities of interest. They provide alternatives to approximate inference methods based on stochastic sampling or search. We describe a variational approach to the problem of diagnos...
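As a minimal illustration of such a deterministic procedure (not the QMR-DT method itself, whose details the snippet omits): mean-field coordinate ascent on a two-variable pairwise model, checked against exact marginals from enumeration. The parameter values are arbitrary choices for the example.

```python
# Mean-field coordinate ascent vs. exact enumeration on a tiny pairwise model.
import numpy as np

t1, t2, w = 0.5, -0.3, 2.0             # model parameters (arbitrary)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Exact marginals by enumerating all four joint states.
states = [(a, b) for a in (0, 1) for b in (0, 1)]
p = np.array([np.exp(t1 * a + t2 * b + w * a * b) for a, b in states])
p /= p.sum()
exact_q1 = p[2] + p[3]                 # P(x1 = 1)
exact_q2 = p[1] + p[3]                 # P(x2 = 1)

# Mean-field fixed-point updates: deterministic, no sampling involved.
q1, q2 = 0.5, 0.5
for _ in range(100):
    q1 = sigmoid(t1 + w * q2)
    q2 = sigmoid(t2 + w * q1)

print(f"mean-field: q1={q1:.3f} q2={q2:.3f}")
print(f"exact     : q1={exact_q1:.3f} q2={exact_q2:.3f}")
```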
Published: 2011